Learning from biased data using
نویسنده
چکیده
Data bases sometimes contain a non-random sample from the population of interest. This complicates the use of extracted knowledge for predictive purposes. We consider a specific type of biased data that is of considerable practical interest, namely non-random partially classified data. This type of data typically results when some screening mechanism determines whether the correct class of a particular case is known. In credit scoring the problem of learning from such a biased sample is called “reject inference”, since the class label (e.g. good or bad loan) of rejected loan applications is unknown. We show that maximum likelihood estimation of so called mixture models is appropriate for this type of data, and discuss an experiment performed on simulated data using mixtures of normal components. The benefits of this approach are shown by making a comparison with the results of sample-based discriminant analysis. Some directions are given how to extend the analysis to allow for nonnormal components and missing attribute values in order to make it suitable for “real-life” biased data.
منابع مشابه
The Effect of Topic Bias on the Writing Proficiency of Extrovert/Introvert EFL Learners
This study was intended to find out any possible effect of topic bias on the writing proficiency of Iranian extrovert/introvert EFL learners at high/low writing proficiency levels. One hundred participants chosen from among 150 adult language learners on the basis of their personality type (extrovert/introvert) and writing proficiency (high/low) took part in this study. They were arranged into ...
متن کاملStudy of Random Biased d-ary Tries Model
Tries are the most popular data structure on strings. We can construct d-ary tries by using strings over an alphabet leading to d-ary tries. Throughout the paper we assume that strings stored in trie are generated by an appropriate memory less source. In this paper, with a special combinatorial approach we extend their analysis for average profiles to d-ary tries. We use this combinatorial appr...
متن کاملOn Mining Fuzzy Classification Rules for Imbalanced Data
Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it on imbalanced data sets is its biased to the majority class, such that, it performs poorly in respect to the minority class. However many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
متن کاملDeep-Treat: Learning Optimal Personalized Treatments from Observational Data using Neural Networks
We propose a novel approach for constructing effective treatment policies when the observed data is biased and lacks counterfactual information. Learning in settings where the observed data does not contain all possible outcomes for all treatments is difficult since the observed data is typically biased due to existing clinical guidelines. This is an important problem in the medical domain as c...
متن کاملBuilding a Biased Least Squares Support Vector Machine Classifier for Positive and Unlabeled Learning
Learning from positive and unlabeled examples (PU learning) is a special case of semi-supervised binary classification. The key feature of PU learning is that there is no labeled negative training data, which makes the traditional classification techniques inapplicable. Similar to the idea of Biased-SVM which is one of the most famous classifier, a biased least squares support vector machine cl...
متن کاملOn Mining Fuzzy Classification Rules for Imbalanced Data
Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it on imbalanced data sets is its biased to the majority class, such that, it performs poorly in respect to the minority class. However many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1999